Publish Date Numerical Distribution

The heatmap shows that most videos were published in the Evening timeframe (Noon - 8pm), most often on a Tuesday or Sunday. Within the morning hours, Friday saw the most published videos and Thursday and Saturday the fewest. For the 'graveyard' hours, most trending videos were published Monday night. One could infer that publishing on a Tuesday or Sunday evening gives a creator the best chance of trending, although the heatmap only counts videos that did trend, so it cannot show what share of all uploads in each slot succeeded.
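The counts behind a heatmap like this can be sketched as a weekday-by-timeframe tally. This is a minimal sketch, not the notebook's actual code: only the Evening window (Noon - 8pm) comes from the text above, while the Morning and Graveyard boundaries, the function names, and the sample timestamps are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def time_bucket(hour: int) -> str:
    """Map an hour (0-23) to a publishing timeframe."""
    if 12 <= hour < 20:      # Evening window as stated in the text
        return "Evening"
    if 4 <= hour < 12:       # ASSUMED Morning window
        return "Morning"
    return "Graveyard"       # ASSUMED: everything else (8pm - 4am)

def publish_heatmap(timestamps):
    """Count videos per (weekday, timeframe) pair."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        counts[(DAYS[dt.weekday()], time_bucket(dt.hour))] += 1
    return counts

# Hypothetical publish timestamps, not taken from the dataset.
sample = [
    "2020-09-15T17:30:00",  # Tuesday
    "2020-09-15T18:05:00",  # Tuesday
    "2020-09-13T14:00:00",  # Sunday
    "2020-09-18T09:15:00",  # Friday
    "2020-09-14T23:45:00",  # Monday
]
heatmap = publish_heatmap(sample)
busiest = heatmap.most_common(1)[0]  # ((weekday, timeframe), count)
```

Each cell of the heatmap is then just one entry of `heatmap`, keyed by weekday and timeframe.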

The box plot shows that trending videos typically have between 500k and 2.3M views. Videos with over 5M views tend to be the exception. We can posit that a video needs to gain at least 500k views before it appears on the Trending Videos page.
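If the 500k to 2.3M range is read as the interquartile range (an assumption; the text does not say which part of the box it refers to), the usual 1.5 x IQR rule puts the upper fence at 2.3M + 1.5 x 1.8M = 5.0M, which lines up with videos above 5M views being exceptions. The sketch below reproduces that arithmetic on hypothetical view counts chosen to have those quartiles; the function name and the data are illustrative only.

```python
import statistics

def box_plot_summary(values):
    """Quartiles plus Tukey fences (1.5 x IQR) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }

# Hypothetical view counts, picked so Q1 = 500k and Q3 = 2.3M.
views = [150_000, 300_000, 500_000, 800_000, 1_100_000,
         1_700_000, 2_300_000, 3_400_000, 6_000_000]

summary = box_plot_summary(views)
# Points beyond the upper fence are the ones a box plot draws as outliers.
outliers = [v for v in views if v > summary["upper_fence"]]
```

The lower fence comes out negative here, so under this assumption the box plot would flag no low-view outliers at all.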

The numerical distributions show that trending videos gain far more likes than comments or dislikes, and the scale of each engagement type is very different. The median number of likes is 52K, while the median number of dislikes is 0.8K, roughly 65 times lower. Comments sit slightly above dislikes at a median of 3.3K. These medians suggest that trending videos typically combine high like and comment counts with comparatively few dislikes.

Correlation Matrix of Numerical Columns

The numerical columns are generally positively correlated with one another. One of the stronger correlations appears to be between comment_count and likes. The view_count vs likes graph in the top row suggests a lower bound on views that rises as the number of likes increases. Also in the top row, the view_count vs comment_count graph shows two characteristics. First, there seems to be a set of videos that gain high views with little to no comments, visible as the cluster of points along the y-axis. Second, comments correlate positively with views: as the number of views increases, so does the number of comments.
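The numbers behind a matrix like this are pairwise Pearson coefficients. The following is a minimal sketch of how they could be computed; only the column names view_count, likes, and comment_count appear in the text, so the `dislikes` key, the helper names, and all the figures are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Nested dict of pairwise correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}

# Hypothetical engagement figures (in thousands), not from the dataset.
columns = {
    "view_count":    [500, 900, 1500, 2300, 5200],
    "likes":         [20, 41, 52, 110, 260],
    "dislikes":      [0.3, 0.9, 0.8, 1.6, 4.1],
    "comment_count": [1.5, 2.8, 3.3, 7.9, 19.0],
}
matrix = correlation_matrix(columns)
likes_vs_comments = matrix["likes"]["comment_count"]
```

A coefficient near 1 only says the relationship is close to linear; the cluster of high-view, low-comment videos described above would show up in the scatter plot rather than in this single number.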

The US seems to dominate views on trending videos until sometime in March, when Canada starts to become the dominant source. Great Britain, for the most part, sees fewer views than either country. Great Britain also seems more closely synced with Canadian viewers than with US viewers, as can be seen in the Sep. 16th spike and the July 14th / July 15th spikes. US viewers seem to have different tastes than Canadian and British viewers. Canada also has the largest single-day viewership spike, at 1.19B views on July 4th.
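A comparison like this rests on summing view counts per country per day. The sketch below shows one way to do that and to pull out each country's peak day and each day's leading country; the row layout, function names, dates, and numbers are all hypothetical and stand in for whatever the notebook actually uses.

```python
from collections import defaultdict

def daily_views_by_country(rows):
    """Sum views per (country, date) from (country, date, views) rows."""
    totals = defaultdict(int)
    for country, date, views in rows:
        totals[(country, date)] += views
    return dict(totals)

def peak_day(totals, country):
    """Return (date, views) for the country's largest single-day total."""
    days = {d: v for (c, d), v in totals.items() if c == country}
    return max(days.items(), key=lambda kv: kv[1])

def leader_by_day(totals):
    """Map each date to the country with the most views that day."""
    best = {}
    for (country, date), views in totals.items():
        if date not in best or views > best[date][1]:
            best[date] = (country, views)
    return {date: country for date, (country, _) in best.items()}

# Hypothetical rows (views in millions), not taken from the dataset.
rows = [
    ("US", "2020-07-03", 120), ("US", "2020-07-04", 150),
    ("CA", "2020-07-03", 90),  ("CA", "2020-07-04", 200),
    ("CA", "2020-07-04", 210), ("GB", "2020-07-04", 60),
]
totals = daily_views_by_country(rows)
ca_peak = peak_day(totals, "CA")
leaders = leader_by_day(totals)
```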

All I can say is that this video somehow broke the Canadian YT viewership. It's kinda funny but mostly dumb. Humor cannot be explained :(

PCA of Categories from User Engagement

The PCA attempts to show what a 4D graph of user engagement points would look like in 2D. If we were to try to assign a given trending video to a category, we could probably do it for some of the Music category, but most of the other categories overlap. This shows that user engagement alone does not carry enough information to "accurately" categorize a given trending video.

PCA of User Engagement w/o the Music Category

By removing the Music category, we get a closer look at the concentration of the other categories. As we posited above, user engagement alone does not seem to separate the categories clearly enough to build a classifier that automatically categorizes trending videos.

An 89% retention of information means that the two principal components keep 89% of the variance present in the original 4D data. With the amount of overlap, it seems doubtful that the 11% lost could help in separating the categories.
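For reference, the 89% figure is the cumulative explained-variance ratio of the two retained components. Writing $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \lambda_4$ for the eigenvalues of the covariance matrix of the four engagement features (notation introduced here, not taken from the notebook), the retained share is:

$$
\frac{\lambda_1 + \lambda_2}{\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4} \approx 0.89,
\qquad
\frac{\lambda_3 + \lambda_4}{\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4} \approx 0.11
$$

The discarded 11% lies along the two directions in which the data varies least, which is why it is unlikely to pull the overlapping categories apart.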